"Convex Until Proven Guilty": Dimension-Free Acceleration of Gradient Descent on Non-Convex Functions
Abstract
We develop and analyze a variant of Nesterov’s accelerated gradient descent (AGD) for minimization of smooth non-convex functions. We prove that one of two cases occurs: either our AGD variant converges quickly, as if the function was convex, or we produce a certificate that the function is “guilty” of being non-convex. This non-convexity certificate allows us to exploit negative curvature and obtain deterministic, dimension-free acceleration of convergence for non-convex functions. For a function $f$ with Lipschitz continuous gradient and Hessian, we compute a point $x$ with $\|\nabla f(x)\| \le \epsilon$ in $O(\epsilon^{-7/4} \log(1/\epsilon))$ gradient and function evaluations. Assuming additionally that the third derivative is Lipschitz, we require only $O(\epsilon^{-5/3} \log(1/\epsilon))$ evaluations.
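To make the dichotomy in the abstract concrete, here is a minimal Python sketch of the "convex until proven guilty" idea, not the paper's exact algorithm: a Nesterov-style AGD loop that, at each step, tests an inequality every convex function must satisfy; a violation is returned as a non-convexity certificate, which the paper then uses to extract a negative-curvature direction. The momentum schedule, step size, and stopping rule below are illustrative assumptions.

```python
import numpy as np

def convex_until_proven_guilty(f, grad, x0, L, eps=1e-3, max_iter=10_000):
    """Nesterov-style AGD that also watches for proof of non-convexity.

    A simplified sketch of the idea described in the abstract: either the
    iterates reach an approximately stationary point, or we find a pair of
    points violating a convexity inequality (a 'non-convexity certificate').
    """
    x_prev, x = x0.copy(), x0.copy()
    for k in range(1, max_iter + 1):
        beta = (k - 1) / (k + 2)            # standard momentum schedule
        y = x + beta * (x - x_prev)         # look-ahead (momentum) point
        g = grad(y)
        if np.linalg.norm(g) <= eps:        # approximate stationarity reached
            return y, None
        # Convexity would force f(x) >= f(y) + <grad(y), x - y>.
        gap = f(x) - (f(y) + g @ (x - y))
        if gap < 0:
            return y, (x, y)                # certificate that f is non-convex
        x_prev, x = x, y - g / L            # gradient step from the look-ahead point
    return x, None
```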
Similar Papers
Regret bounds for Non Convex Quadratic Losses Online Learning over Reproducing Kernel Hilbert Spaces
We present several online algorithms with dimension-free regret bounds for general non-convex quadratic losses by viewing them as functions in Reproducing Kernel Hilbert Spaces. In our work we adapt the Online Gradient Descent, Follow the Regularized Leader and the Conditional Gradient method meta-algorithms to the RKHS setting and provide regret bounds in this setting. By analyzing them as algorith...
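As a rough illustration of what online gradient descent over an RKHS can look like in code, here is a generic kernelized OGD learner for squared loss; the kernel, loss, and step size are illustrative assumptions and not necessarily those of the cited work.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    return np.exp(-gamma * np.linalg.norm(a - b) ** 2)

class KernelOGD:
    """Online gradient descent on a predictor living in an RKHS (sketch)."""

    def __init__(self, kernel=rbf_kernel, eta=0.1):
        self.kernel, self.eta = kernel, eta
        self.points, self.coefs = [], []

    def predict(self, x):
        return sum(c * self.kernel(p, x) for p, c in zip(self.points, self.coefs))

    def update(self, x, y):
        # Squared loss l(f(x), y) = (f(x) - y)^2 has functional gradient
        # 2 (f(x) - y) k(x, .), so the OGD step adds one kernel term.
        residual = self.predict(x) - y
        self.points.append(x)
        self.coefs.append(-2.0 * self.eta * residual)
```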
Accelerating Asynchronous Algorithms for Convex Optimization by Momentum Compensation
Asynchronous algorithms have attracted much attention recently due to the crucial demand for solving large-scale optimization problems. However, the accelerated versions of asynchronous algorithms are rarely studied. In this paper, we propose the “momentum compensation” technique to accelerate asynchronous algorithms for convex problems. Specifically, we first accelerate the plain Asynchronous ...
Convergence Rate of Sign Stochastic Gradient Descent for Non-convex Functions
The sign stochastic gradient descent method (signSGD) utilises only the sign of the stochastic gradient in its updates. For deep networks, this one-bit quantisation has surprisingly little impact on convergence speed or generalisation performance compared to SGD. Since signSGD is effectively compressing the gradients, it is very relevant for distributed optimisation where gradients need to be a...
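The update rule described above is simple enough to state in a few lines. The sketch below applies one signSGD step; the learning rate and parameter layout are illustrative assumptions.

```python
import numpy as np

def signsgd_step(params, stochastic_grads, lr=1e-3):
    """One signSGD update: move each parameter by lr times the sign of its
    stochastic gradient, discarding the magnitude (a one-bit quantisation)."""
    return [p - lr * np.sign(g) for p, g in zip(params, stochastic_grads)]
```

In a distributed setting, each worker would only need to communicate the sign vector of its gradient, which is where the compression benefit mentioned in the abstract comes from.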
On Steepest Descent Algorithms for Discrete Convex Functions
This paper investigates the complexity of steepest descent algorithms for two classes of discrete convex functions, M-convex functions and L-convex functions. Simple tie-breaking rules yield complexity bounds that are polynomials in the dimension of the variables and the size of the effective domain. Combination of the present results with a standard scaling approach leads to an efficient algor...
Distributed Optimization of Convex Sum of Non-Convex Functions
We present a distributed solution to optimizing a convex function composed of several non-convex functions. Each non-convex function is privately stored with an agent, while the agents communicate with neighbors to form a network. We show that the coupled consensus and projected gradient descent algorithm proposed in [1] can optimize a convex sum of non-convex functions under an additional assumption o...
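A generic sketch of the consensus-plus-projected-gradient scheme this abstract refers to (the exact algorithm and assumptions of [1] may differ): each agent mixes its neighbors' iterates using a doubly stochastic weight matrix and then takes a projected gradient step on its own local, possibly non-convex, function.

```python
import numpy as np

def consensus_projected_gd(grads, x_init, W, project, eta=0.01, iters=500):
    """Consensus + projected gradient descent for minimizing sum_i f_i(x) (sketch).

    grads   : list of local gradient functions, one per agent
    x_init  : common starting point (1-D numpy array)
    W       : doubly stochastic mixing matrix encoding the network
    project : projection onto the common convex constraint set
    """
    n = len(grads)
    X = np.array([x_init.copy() for _ in range(n)], dtype=float)  # one iterate per agent
    for _ in range(iters):
        X_mix = W @ X                                              # consensus (mixing) step
        X = np.array([project(X_mix[i] - eta * grads[i](X_mix[i]))  # local projected gradient step
                      for i in range(n)])
    return X.mean(axis=0)
```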